Methodology For Developing Empirical Models of TCP-Based Applications* (Extended Abstract)

Authors

  • F. Hernández Campos
  • K. Jeffay
  • F. D. Smith
Abstract

We report on a large-scale empirical study to create application-level models for TCP traffic generation in simulations and network test-beds. A novel aspect of the study is the development of a method to construct empirical application-level models for arbitrary client/server (request/response) application-level protocols based on an analysis of unidirectional traces of only the TCP/IP headers in the traffic generated by the applications. By analyzing large collections of application-specific traces from a number of sources, we are producing empirical distributions to populate stochastic models of HTTP, FTP, and SMTP. Combined, however, these applications represent less than 50% of the observed TCP traffic on our campus. We are therefore also investigating the use of cluster analysis to identify sets of statistically homogeneous TCP connection traces. Once these clusters have been identified, we can model each cluster by fitting stochastic models to the set of TCP connections identified as belonging to it.

1. Background and Motivation

A critical component of Internet simulations and test-bed measurements is the generation of synthetic traffic. Floyd and Paxson [12] provide an excellent analysis of the issues and pitfalls, especially for traffic-generating models that are based on empirical data that has already been "shaped" by network influences. They conclude, "... if we take care to use traffic traces to characterize source behavior, rather than packet-level behavior, we can use the source-level descriptions in simulations to synthesize plausible traffic." In particular, for TCP-based applications, TCP's end-to-end congestion control (perhaps influenced by router-based mechanisms such as RED packet drops or ECN markings) shapes the low-level packet-by-packet traffic processes. Thus the generation of TCP traffic must be accomplished by using application-dependent but network-independent traffic sources layered over (real or simulated) TCP implementations.
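The header-only analysis described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' actual tool: the `Header` representation and the turn-detection rule (a jump in the client's ACK number signals server data) are simplifying assumptions for the example.

```python
# Illustrative sketch (not the paper's actual tool): inferring
# application-level request/response exchanges from ONE direction of a
# TCP connection, using only sequence and acknowledgment numbers in
# client->server headers. Fields and turn-detection rule are simplified.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Header:
    seq: int      # client sequence number (client->server byte stream)
    ack: int      # acknowledgment number (bytes received from server)
    payload: int  # TCP payload length carried by this segment

def infer_exchanges(headers: List[Header]) -> List[Tuple[int, int]]:
    """Split a unidirectional client->server header trace into
    (request_bytes, response_bytes) pairs. An advance in the client's
    ACK field reveals server data; client payload sent after a response
    marks the start of a new exchange."""
    exchanges = []
    req_bytes = resp_bytes = 0
    base_ack = headers[0].ack
    in_response = False
    for h in headers:
        if h.ack > base_ack:               # server bytes acknowledged
            resp_bytes = h.ack - base_ack
            in_response = True
        if h.payload > 0:                  # client sends request data
            if in_response:                # previous exchange complete
                exchanges.append((req_bytes, resp_bytes))
                base_ack, req_bytes, resp_bytes = h.ack, 0, 0
                in_response = False
            req_bytes += h.payload
    if req_bytes or resp_bytes:
        exchanges.append((req_bytes, resp_bytes))
    return exchanges
```

Applied to, say, a persistent HTTP/1.1 connection carrying two GETs, such an analysis would yield two (request size, response size) pairs, which is the kind of application-level data unit that can populate the empirical distributions mentioned above.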
Two important pre-Web measurement efforts that produced application-dependent traffic models (Telnet, FTP, NNTP, SMTP) were conducted by Danzig et al. [5, 10, 11] and by Paxson [15]. Web traffic generators in use today are usually based on data from two seminal measurement projects that focused on capturing web-browsing source behaviors: the Mah [13] and the Barford, Crovella, et al. [1, 3, 8, 9] studies. Traffic generators based on these sources have been built into the widely used ns network simulator [4] and have also been used to generate traffic in laboratory test-bed networks [2, 6]. Constructing traffic generators for TCP applications ultimately depends on the availability of high-quality Internet measurement data from a variety of sites that can be used to obtain characteristics of source (application-level) behavior. Our research is concerned with capturing and analyzing large-scale collections of Internet traces of TCP/IP protocol headers to create contemporary source-level models for traffic generation. We are further concerned with doing this with low-cost, low-overhead methods that can be used at multiple sites as an ongoing effort to keep models up to date, because Internet applications (and their use) continue to evolve rapidly. We are currently working with traces of TCP/IP packet headers from two sources: (1) a collection we obtained by placing a network monitor on the Gigabit Ethernet link connecting the University of North Carolina at Chapel Hill (UNC) campus network to the Internet, and (2) a sample of traces taken from the collection maintained by NLANR/MOAT [18].

1 Student author.
* This work supported in part by grants from the National Science Foundation (grants CDA-9624662, ITR-0082870, and ITR-0082866), the Cisco, IBM, Intel, Sun, Cabletron, and Aprisma corporations, and NCNI.
The UNC collection consists of traces taken in September 1999 (42 one-hour traces taken at six intervals on each of seven days), October 2000 (42 one-hour traces taken at six intervals on each of seven days), and April 2001 (21 four-hour traces taken at three intervals on each of seven days). The aggregate size of these TCP/IP headers is about 500 GB. The NLANR/MOAT traces are from a number of sites and have been collected regularly over several years. Most of these traces cover only 90-second intervals and present problems in trying to capture very large data objects. We have, however, recently obtained traces from the NLANR/MOAT Auckland-IV collection [19] that are 24-hour traces taken at the University of Auckland by the WAND research group [20] during April 2001.
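The cluster analysis mentioned in the abstract can be illustrated with a minimal sketch. This is our own simplification, not the paper's method: the per-connection feature vector (exchange count and log-scaled byte totals), the deterministic center initialization, and the choice of k are all assumptions made only for this example.

```python
# Illustrative sketch (a simplification, not the paper's method): group
# TCP connection traces by a small per-connection feature vector so that
# each resulting cluster can later be fitted with its own stochastic
# model. Feature choice, initialization, and k are assumptions here.

import math

def features(exchanges):
    """Feature vector for one connection: number of request/response
    exchanges plus log-scaled total bytes in each direction."""
    req = sum(r for r, _ in exchanges)
    resp = sum(s for _, s in exchanges)
    return (float(len(exchanges)),
            math.log10(req + 1),
            math.log10(resp + 1))

def kmeans(points, k, iters=20):
    """Plain k-means on feature tuples (k >= 2). Initial centers are
    spread evenly across the input list to keep the example
    deterministic rather than randomly seeded."""
    centers = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centers[j])))
            groups[nearest].append(p)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups
```

Once connections are partitioned this way, one empirical distribution (e.g., of request sizes or exchange counts) can be fitted per cluster instead of forcing a single model onto statistically heterogeneous traffic.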



Publication date: 2001